@maki49
Collaborator

@maki49 maki49 commented Jan 16, 2025

No description provided.

@maki49
Collaborator Author

maki49 commented Jan 16, 2025

The bug that occurred in the integration test seems strange:
image

I tried to debug it on my machine and found that:

  • It happens at the second call of the complex eigensolver pzheev.
  • Compiling with Intel oneAPI has no such problem. It only happens with the GNU compiler.
  • Running with 1 processor encounters the same problem.
  • Both jobtype = 'N' and 'V' have this problem.
  • pzheevd and pzheevx also have this problem.
  • Enlarging work and rwork to 100 times their queried sizes does not help.
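
For readers unfamiliar with the workspace query mentioned in the last item, here is a minimal sketch of the usual two-phase pattern: call once with `lwork = lrwork = -1` so the routine only reports the sizes it needs in `work[0]` and `rwork[0]`, then allocate and call again. `SolverCall`, `Workspace`, `solve_with_queried_workspace` and `pad_factor` are hypothetical names for illustration, not ABACUS code; the callback stands in for a real driver such as pzheev, whose remaining arguments are assumed to be bound inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <functional>
#include <vector>

// Hypothetical stand-in for a ScaLAPACK driver call (e.g. pzheev) with all
// non-workspace arguments already bound. Returns the routine's info value.
using SolverCall = std::function<int(std::complex<double>* work, int lwork,
                                     double* rwork, int lrwork)>;

struct Workspace
{
    std::vector<std::complex<double>> work;
    std::vector<double> rwork;
};

// Phase 1: query the required sizes with lwork = lrwork = -1.
// Phase 2: allocate (optionally padded by pad_factor) and run the solver.
int solve_with_queried_workspace(const SolverCall& solver, Workspace& ws, int pad_factor = 1)
{
    assert(pad_factor >= 1);
    std::complex<double> work_query(0.0, 0.0);
    double rwork_query = 0.0;
    const int query_info = solver(&work_query, -1, &rwork_query, -1);
    if (query_info != 0) { return query_info; }

    // The required sizes are reported as floating-point values.
    const int lwork = std::max(1, static_cast<int>(work_query.real())) * pad_factor;
    const int lrwork = std::max(1, static_cast<int>(rwork_query)) * pad_factor;
    ws.work.assign(lwork, std::complex<double>(0.0, 0.0));
    ws.rwork.assign(lrwork, 0.0);
    return solver(ws.work.data(), lwork, ws.rwork.data(), lrwork);
}
```

Since padding the buffers by a factor of 100 did not change the outcome here, an undersized workspace is unlikely to be the cause, which points towards the other arguments or the link configuration instead.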

gdb info:

Thread 1 "abacus" received signal SIGSEGV, Segmentation fault.
0x00001555447c8cbb in mkl_lapack_zladiv () from /opt/intel/oneapi/mkl/2024.2/lib/libmkl_core.so.2
(gdb) bt
#0  0x00001555447c8cbb in mkl_lapack_zladiv ()
   from /opt/intel/oneapi/mkl/2024.2/lib/libmkl_core.so.2
#1  0x0000555555f5bb2a in pzlarfg_.constprop ()
#2  0x0000555555f6736b in pzlatrd_.constprop ()
#3  0x0000555555f5aa8c in pzhetrd_.constprop ()
#4  0x0000555555f0637a in pzheev_ ()
#5  0x0000555555e3e5f0 in LR_Util::diag_scalapack (
    n=@0x7fffffff89a0: 80, mat=mat@entry=0x555559120830, 
    eigval=0x555558a4a560, eigvec=eigvec@entry=0x555559139840, 
    desc=...)
    at /home/fortneu49/abacus-fix/abacus-develop/source/module_lr/utils/lr_util.cpp:222

@caic99 @dyzheng do you have any ideas?

@mohanchen added the "EXX and lr-TDDFT" (Related to EXX or lr-TDDFT) and "Refactor" (Refactor ABACUS codes) labels on Jan 17, 2025
@caic99
Member

caic99 commented Jan 23, 2025

Hi @maki49 ,
Would you check all the input parameters (and the shapes of the arrays) with extra care? For example, do the pointers and the leading dimensions (ld) match?
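
One way to make such a check concrete is sketched below, assuming the standard 9-element ScaLAPACK array descriptor and a 2D block-cyclic layout. `local_extent` reproduces the arithmetic of ScaLAPACK's NUMROC in plain C++ so that the check can run without linking the library; `descriptor_matches_buffer` and its parameters are hypothetical helpers, not part of ABACUS or ScaLAPACK.

```cpp
#include <algorithm>
#include <cassert>

// Number of rows (or columns) of a block-cyclically distributed dimension
// owned by process iproc; same arithmetic as ScaLAPACK's NUMROC.
int local_extent(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(nb > 0 && nprocs > 0);
    const int mydist = (nprocs + iproc - isrcproc) % nprocs;
    const int nblocks = n / nb;
    int extent = (nblocks / nprocs) * nb;
    const int extrablks = nblocks % nprocs;
    if (mydist < extrablks) { extent += nb; }
    else if (mydist == extrablks) { extent += n % nb; }
    return extent;
}

// Descriptor layout: 0 DTYPE, 1 CTXT, 2 M, 3 N, 4 MB, 5 NB, 6 RSRC, 7 CSRC, 8 LLD.
// Returns true if LLD covers the local rows and the buffer holds
// LLD * (local columns) elements, the conventional allocation size.
bool descriptor_matches_buffer(const int desc[9], int myrow, int mycol,
                               int nprow, int npcol, long buffer_elems)
{
    const int loc_rows = local_extent(desc[2], desc[4], myrow, desc[6], nprow);
    const int loc_cols = local_extent(desc[3], desc[5], mycol, desc[7], npcol);
    const int lld = desc[8];
    if (lld < std::max(1, loc_rows)) { return false; }
    return static_cast<long>(lld) * loc_cols <= buffer_elems;
}
```

For instance, with the n = 80 matrix from the backtrace, an assumed block size of 32 and an assumed 2 x 2 process grid, process (0, 0) owns 48 rows and 48 columns, so its descriptor needs LLD >= 48 and its local buffer at least 48 * 48 elements.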

@mohanchen
Collaborator

It seems that all tests have passed. If the issue has been solved and the PR can be accepted, let me know.

@maki49
Collaborator Author

maki49 commented Jan 23, 2025

It hasn't been solved. I'm still debugging. Here are some test results:

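| compiler | solver  | 1 processor | multi-processor                    |
|----------|---------|-------------|------------------------------------|
| intel    | pzheevx | OK          | segfault                           |
| gnu      | pzheevx | OK          | segfault                           |
| intel    | pzhegvx | OK          | segfault                           |
| gnu      | pzhegvx | OK          | segfault                           |
| intel    | pzheev  | OK          | OK                                 |
| gnu      | pzheev  | segfault    | segfault                           |
| intel    | pzheevd | OK          | OK                                 |
| gnu      | pzheevd | OK          | usually OK but occasional segfault |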

@maki49 closed this on Feb 8, 2025
